The Chordate Proteome History Database
Abstract
The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and to enable automatic functional searches. First, phylogenetic analyses based on high-quality multiple sequence alignments and a robust phylogenetic pipeline were performed for each whole protein and for each individual domain. Novel approaches were developed to identify orthologs and paralogs, and to predict gene duplication, gain and loss events as well as the occurrence of new protein architectures (domain gains, losses and shuffling). These genetic events were localized on the phylogenetic trees and on the genomic sequence. Second, the phylogenetic trees were complemented by phylogroups: groups of orthologous sequences created using OrthoMCL and then corrected based on the phylogenetic trees. Gene family size and gene gain or loss in a given lineage can be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed using biological objects (protein identifier, keyword or domain) or can be based on events; for example, domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
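As a rough illustration of how gene family size and lineage-specific gain or loss might be read off a phylogroup, the sketch below counts members per species and compares each count with a chosen reference species. The phylogroup contents, the protein identifiers, the species list and both function names are hypothetical, and the simple count comparison stands in for the tree-based correction the abstract describes; it is not the database's actual procedure.

```python
from collections import Counter

# Hypothetical phylogroup: (species, protein identifier) pairs.
# These identifiers are placeholders, not real database entries.
phylogroup = [
    ("Homo sapiens", "P1"),
    ("Homo sapiens", "P2"),
    ("Mus musculus", "P3"),
    ("Danio rerio", "P4"),
    ("Danio rerio", "P5"),
    ("Danio rerio", "P6"),
]


def family_sizes(group, species_list):
    """Return the number of phylogroup members found in each species."""
    counts = Counter(species for species, _ in group)
    # Species absent from the group get an explicit size of 0.
    return {s: counts.get(s, 0) for s in species_list}


def copy_number_changes(sizes, reference):
    """Return the difference in family size relative to a reference species.

    Positive values suggest a gain, negative values a loss. This is a naive
    comparison (an assumption of this sketch), not a tree reconciliation.
    """
    ref_size = sizes[reference]
    return {s: n - ref_size for s, n in sizes.items() if s != reference}


species = ["Homo sapiens", "Mus musculus", "Danio rerio", "Ciona intestinalis"]
sizes = family_sizes(phylogroup, species)
changes = copy_number_changes(sizes, "Mus musculus")

for name in species:
    print(name, sizes[name], changes.get(name, "reference"))
```

A count of zero, as for Ciona intestinalis here, is consistent with either a loss in that lineage or a gain elsewhere; telling the two apart requires the phylogenetic tree, which is why the abstract ties phylogroups back to the trees.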
Similar resources
I-3: Human Y Chromosome Proteome Project 2012 Update
The Human Genome Project has generated a blueprint for the approximately 20,300 gene-encoded proteins potentially active in any of 230 cell types that make up the human body (human proteome). However, based on the UniProtKB/Swiss-Prot database content, about 6000 of these proteins lack experimental evidence at the protein level; for many others, there is very little information related to protein function, abundance, subcellular locali...
Thesis abstract: Development and analysis of a chordate and plant orthologous promoter database
The Adaptive Evolution Database (TAED): a phylogeny based tool for comparative genomics
From 138,662 embryophyte (higher plant) and 348,142 chordate genes, 4216 embryophyte and 15,452 chordate gene families were generated. For each of these gene families, multiple sequence alignments, phylogenetic trees, ratios of non-synonymous to synonymous nucleotide substitution rates (K(a)/K(s)), mappings from gene trees to the NCBI taxonomy and structural links to solved three-dimensional pr...
NOPdb: Nucleolar Proteome Database
The Nucleolar Proteome Database (NOPdb) archives data on >700 proteins that were identified by multiple mass spectrometry (MS) analyses from highly purified preparations of human nucleoli, the most prominent nuclear organelle. Each protein entry is annotated with information about its corresponding gene, its domain structures and relevant protein homologues across species, as well as documentin...
It's a long way from amphioxus: descendants of the earliest chordate.
The origin of chordates and the consequent genesis of vertebrates were major events in natural history. The amphioxus (lancelet) is now recognised as the closest extant relative to the stem chordate and is the only living invertebrate that retains a vertebrate-like development and body plan through its lifespan, despite more than 500 million years of independent evolution from the stem vertebra...